Fast and exact out-of-core and distributed k-means clustering
Authors
Abstract
Related resources
Lightning fast asynchronous distributed k-means clustering
One of the most fundamental data processing approaches is clustering. This holds even in distributed architectures. Here, we focus on the problem of designing efficient and fast K-Means approaches that work in fully distributed, asynchronous networks without any central control. We assume that the network has a huge number of computational units (even orders of magnitude more than the numb...
Fast k-means algorithm clustering
k-means has recently been recognized as one of the best algorithms for clustering unsupervised data. Since k-means depends mainly on distance calculations between all data points and the centers, the time cost is high when the dataset is large (for example, more than 500 million points). We propose a two-stage algorithm to reduce the time cost of distance calculation for huge ...
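The cost the abstract above attributes to distance calculation can be seen in a brute-force assignment step, which evaluates one distance per (point, center) pair. The following is only a minimal sketch of that baseline, not the two-stage algorithm the paper proposes (the abstract is truncated before describing it); the function name `assign_points` and the sample data are hypothetical.

```python
def assign_points(points, centers):
    """Return, for each point, the index of its nearest center.

    Brute force: len(points) * len(centers) squared-distance evaluations,
    which is the per-iteration cost that grows with the dataset size.
    """
    labels = []
    for p in points:
        best_index, best_dist = 0, float("inf")
        for j, c in enumerate(centers):
            # Squared Euclidean distance; the square root is not needed
            # to decide which center is closest.
            d = sum((a - b) ** 2 for a, b in zip(p, c))
            if d < best_dist:
                best_index, best_dist = j, d
        labels.append(best_index)
    return labels


# Hypothetical sample data: two well-separated groups in 2D.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centers = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_points(points, centers)
```

With n points and k centers, each call performs n·k distance evaluations, which is why methods that skip or approximate some of them matter at the scale the abstract mentions.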
Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D
The k-Means clustering problem on n points is NP-Hard for any dimension d ≥ 2; however, for the 1D case there exist exact polynomial-time algorithms. Previous literature reported an O(kn^2) time dynamic programming algorithm that uses O(kn) space. We present a new algorithm computing the optimal clustering in only O(kn) time using linear space. For k = Ω(lg n), we improve this even further to n2 ...
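For orientation, the classic dynamic program for exact 1D k-means, which runs in O(kn^2) time with an O(kn) table, can be sketched as follows. This illustrates only that well-known baseline, not the faster algorithm the paper presents; the function name `kmeans_1d_dp` is hypothetical. It relies on the fact that in one dimension an optimal clustering consists of contiguous runs of the sorted points.

```python
def kmeans_1d_dp(values, k):
    """Return the optimal k-means cost for 1D data via dynamic programming."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Prefix sums of x and x^2 give the cost of any contiguous run in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        # Sum of squared deviations from the mean for xs[i:j], with j > i.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # dp[c][j] = minimum cost of splitting the first j points into c clusters.
    dp = [[inf] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            # The last cluster is xs[i:j]; try every possible start i.
            dp[c][j] = min(dp[c - 1][i] + cost(i, j) for i in range(c - 1, j))
    return dp[k][n]


best_cost = kmeans_1d_dp([1.0, 2.0, 10.0, 11.0], 2)
```

The inner `min` scans up to n split positions for each of the k·n table entries, which is where the quadratic factor comes from.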
Distributed PCA and k-Means Clustering
This paper proposes a distributed PCA algorithm, with the theoretical guarantee that any good approximation solution on the projected data for k-means clustering is also a good approximation on the original data, while the projected dimension required is independent of the original dimension. When combined with the distributed coreset-based clustering approach in [3], this leads to an algorithm...
Distributed k-Means and k-Median Clustering on General Topologies
This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We p...
Journal
Journal title: Knowledge and Information Systems
Year: 2005
ISSN: 0219-1377,0219-3116
DOI: 10.1007/s10115-005-0210-0